We describe preliminary results from an effort to quantify the uncertainties in parton distribution
functions and the resulting uncertainties in predicted physical quantities. The production cross 
section of the W boson is given as a first example. Constraints due to the full data sets of the CTEQ 
global analysis are used in this study. Two complementary approaches, based on the Hessian and the 
Lagrange multiplier methods respectively, are outlined. We discuss issues in obtaining meaningful
uncertainty estimates that include the effect of correlated experimental systematic uncertainties 
and illustrate them with detailed calculations using one set of precision DIS data.

1 INTRODUCTION

Many measurements at the Tevatron rely on parton distribution functions (PDFs) for significant 
portions of their data analysis as well as the interpretation of their results. For example, in cross 
section measurements the acceptance calculation often relies on Monte Carlo (MC) estimates of 
the fraction of unobserved events. As another example, the measurement of the mass of the 
W boson depends on PDFs via the modeling of the production of the vector boson in MC. In 
such cases, uncertainties in the PDFs contribute, by necessity, to uncertainties on the measured 
quantities. Critical comparisons between experimental data and the underlying theory are often 
even more dependent upon the uncertainties in PDFs. The uncertainties on the production cross 
sections for W and Z bosons, currently limited by the uncertainty on the measured luminosity, 
are approximately 4%. At this precision, any comparison with the theoretical prediction inevitably 
raises the question: How "certain" is the prediction itself? 

A recent example of the importance of PDF uncertainty is the proper interpretation of the
measurement of the high-$E_T$ jet cross section at the Tevatron. When the first CDF measurement was
published there was a great deal of controversy over whether the observed excess, compared to
theory, could be explained by deviations of the PDFs, especially the gluon, from the conventionally
assumed behavior, or whether it could be the first signal of some new physics [||.

With the unprecedented precision and reach of many of the Run I measurements, understanding 
the implications of uncertainties in the PDFs has become a burning issue. During Run II (and
later at the LHC) this issue may strongly affect the uncertainty estimates in precision Standard Model
studies, such as the all-important W-mass measurement, as well as the signal and background
estimates in searches for new physics. 

In principle, it is the uncertainties on physical quantities due to parton distributions, rather than
on the PDFs themselves, that are of primary concern. The latter are theoretical constructs which
depend on the renormalization and factorization schemes, and there are strong correlations between
PDFs of different flavors and at different values of x, which can compensate each other in the
convolution integrals that relate them to physical cross sections. On the other hand, since PDFs are
universal, if we can obtain meaningful estimates of their uncertainties based on analysis of existing
data, then the results can be applied to all processes that are of interest in the future. ||, Q

One can attempt to assess directly the uncertainty on a specific physical prediction due to the full 
range of PDFs allowed by available experimental constraints. This approach will provide a more 
reliable estimate for the range of possible predictions for the physical variable under study, and 
may be the best course of action for ultra-precise measurements such as the mass of the W boson 
or the W production cross-section. However, such results are process-specific and therefore the 
analysis must be carried out for each case individually. 

Until recently, the attempts to quantify either the uncertainties on the PDFs themselves (via 
uncertainties on their functional parameters, for instance) or the uncertainty on derived quantities 
due to variations in the PDFs have been rather unsatisfactory. Two commonly used methods are: 
(1) Comparing the predictions obtained with different PDF sets, e.g., various CTEQ, MRS |]]
and GRV |?J sets; (2) Within a given global analysis effort, varying individual functional parameters
ad hoc, within limits considered to be consistent with the existing data, e.g. Q. Neither method
provides a systematic, quantitative measure of the uncertainties of the PDFs or their predictions.

Figure 1: Predicted cross section for W boson production for various PDFs.

As a case in point, Fig. 1 shows how the calculated value of the cross section for W boson production
at the Tevatron varies with a set of historical CTEQ PDFs as well as the most recent CTEQ ||
and MRST sets. Also shown are the most recent measurements from D0 and CDF.$^1$
While it is comforting to see that the predictions have remained within a narrow range, the variation 
observed cannot be characterized as a meaningful estimate of the uncertainty: (i) the variation 
with time reflects mostly the changes in experimental input to, or analysis procedure of, the global 
analyses; and (ii) the perfect agreement between the values of the most recent CTEQ5M1$^2$ and
MRS99 sets must be fortuitous, since each group has also obtained other satisfactory sets which
give rise to much larger variations of the W cross section. The MRST group, in particular, has
examined the range of this variation by setting a variety of parameters to some extreme values || .
These studies are useful but cannot be considered quantitative or definitive. What is needed are
methods that explore thoroughly the possible variations of the parton distribution functions.

It is important to recognize all potential sources of uncertainty in the determination of PDFs.
Focusing on some of these, while neglecting significant others, may not yield practically useful
results. Sources of uncertainty are listed below:

• Statistical uncertainties of the experimental data used to determine the PDFs. These vary 
over a wide range among the experiments used in a global analysis, but are straightforward to 
treat. 

• Systematic uncertainties within each data set. There are typically many sources of experimental
systematic uncertainty, some of which are highly correlated. These uncertainties can be
treated by standard methods of probability theory provided they are precisely known, which
unfortunately is often not the case, either because they may not be randomly distributed or
because their estimation in practice involves subjective judgements.

• Theoretical uncertainties arising from higher-order PQCD corrections, resummation corrections
near the boundaries of phase space, power-law (higher twist) and nuclear target corrections, etc.

• Uncertainties due to the parametrization of the non-perturbative PDFs, $f_a(x, Q_0)$, at some
low momentum scale $Q_0$. The specific choice of the functional form used at $Q_0$ introduces implicit
correlations between the various x-ranges, which could be as important as, if not more so than, the
experimental correlations in the determination of $f_a(x, Q)$ for all Q.

1 It is interesting to note that much of the difference between the D0 and CDF W cross sections is due to the
different values of the total $p\bar{p}$ cross section used.

2 CTEQ5M1 is an updated version of CTEQ5M differing only in a slight improvement in the QCD evolution (cf.
note added in proof of ||). The differences are completely insignificant for our purposes. Henceforth, we shall refer
to them generically as CTEQ5M. Both sets can be obtained from the web address http://cteq.org/.

Since strict quantitative statistical methods are based on idealized assumptions, such as random 
measurement uncertainties, an important trade-off must be faced in devising a strategy for the 
analysis of PDF uncertainties. If emphasis is put on the "rigor" of the statistical method, then 
many important experiments cannot be included in the analysis, either because the published 
errors appear to fail strict statistical tests or because data from different experiments appear to be 
mutually exclusive in the parton distribution parameter space § . If priority is placed on using the 
maximal experimental constraints from available data, then standard statistical methods may not 
apply, but must be supplemented by physical considerations, taking into account experimental and 
theoretical limitations. We choose the latter tack, pursuing the determination of the uncertainties 
in the context of the current CTEQ global analysis. In particular, we include the same body of 
the world's data as constraints in our uncertainty study as that used in the CTEQ5 analysis; and 
adopt the "best fit" - the CTEQ5M1 set - as the base set around which the uncertainty studies are 
performed. In practice, there are unavoidable choices (and compromises) that must be made in the 
analysis. (Similar subjective judgements often are also necessary in estimating certain systematic 
errors in experimental analyses.) The most important consideration is that quantitative results 
must remain robust with respect to reasonable variations in these choices. 

In this Report we describe preliminary results obtained by our group using the two approaches
mentioned earlier. In Section 3 we focus on the error matrix, which characterizes the general
uncertainties of the non-perturbative PDF parameters. In Sections 4 and 5 we study specifically
the production cross section $\sigma_W$ for W bosons at the Tevatron, to estimate the uncertainty of the
prediction of $\sigma_W$ due to PDF uncertainty. We start in Section 2 with a review of some aspects of
the CTEQ global analysis on which this study is based.

2 Elements of the Base Global Analysis 

Since our strategy is based on using the existing framework of the CTEQ global analysis, it is useful 
to review some of its features pertinent to the current study [||. 

Data selection: Table 1 shows the experimental data sets included in the CTEQ5 global analysis,
and in the current study. For neutral current DIS data only the most accurate proton and deuteron
target measurements are kept, since they are the "cleanest" and they are already extremely
extensive. For charged current (neutrino) DIS data, the significant ones all involve a heavy (Fe)
target. Since these data are crucial for the determination of the normalization of the gluon
distribution (indirectly via the momentum sum rule), and for quark flavor differentiation (in conjunction
with the neutral current data), they play an important role in any comprehensive global analysis.
For this purpose, a heavy-target correction is applied to the data, based on measured ratios for
heavy-to-light targets from NMC and other experiments. Direct photon production data are not
included because of serious theoretical uncertainties, as well as possible inconsistencies between
existing experiments. Cf. [||] and ||. The combination of neutral and charged DIS, lepton-pair
production, lepton charge asymmetry, and inclusive large-$p_T$ jet production processes provides a
fairly tightly constrained system for the global analysis of PDFs. In total, there are ~1300 data
points which meet the minimum momentum scale cuts which must be imposed to ensure that
PQCD applies. The fractional uncertainties on these points are distributed roughly like dF/F over
the range F = 0.003 - 0.4.

Process      Experiment    Measurable                           N_data
DIS          BCDMS [10]    $F_2^H$, $F_2^D$                     324
             NMC [11]      $F_2^H$, $F_2^D$                     240
             H1 [12]       $F_2^H$                              172
             ZEUS [13]     $F_2^H$                              186
             CCFR [14]     $F_2^{Fe}$, $x F_3^{Fe}$             174
Drell-Yan    E605 [15]     $s\, d\sigma / d\sqrt{\tau}\, dy$    119
             E866 [16]     $\sigma(pd) / 2\sigma(pp)$           11
             NA-51 [17]    $A_{DY}$                             1
W-prod.      CDF [18]      Lepton asym.                         11
Incl. Jet    CDF [19]      $d\sigma / dE_t$                     33
             D0 [20]       $d\sigma / dE_t$                     24

Table 1: List of processes and experiments used in the CTEQ5M Global analysis. The total number
of data points is 1295.

Parametrization: The non-perturbative parton distribution functions $f_a(x, Q)$ at a low momentum
scale $Q = Q_0$ are parametrized by a set of functions of x, corresponding to the various flavors
a. For this analysis, $Q_0$ is taken to be 1 GeV. The specific functional forms and the choice of $Q_0$
are not important, as long as the parametrization is general enough to accommodate the behavior
of the true (but unknown) non-perturbative PDFs. The CTEQ analysis adopts the functional form

$$ a_0\, x^{a_1} (1 - x)^{a_2} (1 + a_3 x^{a_4}) $$

for most quark flavors as well as for the gluon.$^3$ After momentum and quark number sum rules
are enforced, there are 18 free parameters left over, hereafter referred to as "shape parameters"
$\{a_i\}$. The PDFs at $Q > Q_0$ are determined from $f_a(x, Q_0)$ by evolution equations from the
renormalization group.
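
As a concrete illustration of the functional form quoted above, the following minimal Python sketch evaluates $a_0 x^{a_1}(1-x)^{a_2}(1 + a_3 x^{a_4})$ at a given x. The function name `pdf_shape` and the numerical parameter values are placeholders chosen purely for illustration; they are not CTEQ5 fit results, and the sum rules mentioned in the text are not imposed here.

```python
def pdf_shape(x, a0, a1, a2, a3, a4):
    """Evaluate a0 * x**a1 * (1 - x)**a2 * (1 + a3 * x**a4) for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return a0 * x**a1 * (1.0 - x)**a2 * (1.0 + a3 * x**a4)


# Hypothetical parameter values, for illustration only (not a CTEQ5 result).
example_params = dict(a0=1.0, a1=-0.5, a2=3.0, a3=2.0, a4=1.0)

if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5):
        print(x, pdf_shape(x, **example_params))
```

The factor $x^{a_1}$ controls the small-x rise, $(1-x)^{a_2}$ the fall-off as x approaches 1, and the last factor adds flexibility at intermediate x.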

3 An exception is that recent data from E866 seem to require the ratio $\bar{d}/\bar{u}$ to take a more unconventional functional
form.

Fitting: The values of $\{a_i\}$ are determined by fitting the global experimental data to the
theoretical expressions which depend on these parameters. The fitting is done by minimizing a global
"chi-square" function, $\chi^2_{\rm global}$. The quotation marks indicate that this function serves as a figure
of merit of the quality of the global fit; it does not necessarily have the full significance associated
with rigorous statistical analysis, for reasons to be discussed extensively throughout the rest of this
report. In practice, this function is defined as:

$$ \chi^2_{\rm global} = \sum_n \sum_i w_n \left[ (N_n d_{ni} - t_{ni}) / \sigma_{ni} \right]^2 + \sum_n \left[ (1 - N_n) / \sigma^N_n \right]^2 \tag{1} $$

where $d_{ni}$, $\sigma_{ni}$, and $t_{ni}$ denote the data, measurement uncertainty, and theoretical value (dependent
on $\{a_i\}$) for the $i$th data point in the $n$th experiment. The second term allows the absolute
normalization ($N_n$) for each experiment to vary, constrained by the published normalization uncertainty
($\sigma^N_n$). The $w_n$ factors are weights applied to some critical experiments with very few data points,
which are known (from physics considerations) to provide useful constraints on certain unique
features of PDFs not afforded by other experiments. Experience shows that without some judiciously
chosen weights, these experimental data points will have no influence in the global fitting process.
The use of these weighting factors, to enable the relevant unique constraints, amounts to imposing
a certain prior probability (based on physics knowledge) on the statistical analysis.

In the above form, $\chi^2_{\rm global}$ includes for each data point the random statistical uncertainties and
the combined systematic uncertainties in uncorrelated form, as presented by most experiments in
the published papers. These two uncertainties are combined in quadrature to form $\sigma_{ni}$ in Eq. (1).
Detailed point-to-point correlated systematic uncertainties are not available in the literature in
general; however, in some cases, they can be obtained from the experimental groups. For global
fitting, uniformity in procedure with respect to all experiments favors the usual practice of merging
them into the uncorrelated uncertainties. For the study of PDF uncertainties, we shall discuss this
issue in more detail in Section 5.
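
The structure of the figure of merit in Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the first term to be the weighted sum of squared residuals $(N_n d_{ni} - t_{ni})/\sigma_{ni}$ and the second term to be the normalization penalty $(1 - N_n)/\sigma^N_n$, and the dictionary layout and key names are hypothetical, not part of the CTEQ fitting code.

```python
def chi2_global(experiments):
    """Figure of merit with per-experiment weights and normalization penalties.

    `experiments` is a list of dicts (a hypothetical layout) with keys:
      "data", "sigma", "theory": equal-length lists of d_ni, sigma_ni, t_ni
      "norm":       fitted normalization N_n
      "norm_sigma": published normalization uncertainty sigma^N_n
      "weight":     weight w_n (1.0 for most experiments)
    """
    total = 0.0
    for exp in experiments:
        n_norm = exp["norm"]
        # First term: weighted sum of squared, normalized residuals.
        for d, s, t in zip(exp["data"], exp["sigma"], exp["theory"]):
            total += exp["weight"] * ((n_norm * d - t) / s) ** 2
        # Second term: penalty for moving the normalization away from 1.
        total += ((1.0 - n_norm) / exp["norm_sigma"]) ** 2
    return total
```

Note that the uncertainties enter only point by point, so any correlation between the systematic uncertainties of different points is invisible to this function.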

Goodness-of-fit for CTEQ5M: Without going into details, Fig. 2 gives an overview of how
well CTEQ5m fits the total data set. The graph is a histogram of the variable $x = (d - t)/\sigma$ where
d is a data value, $\sigma$ the uncertainty of that measurement (statistical and systematic combined), and
t the theoretical value for CTEQ5m. The curve in Fig. 2 has no adjustable parameters; it is the
Gaussian with width 1 normalized to the total number of data points (1295). Over the entire data
set, the theory fits the data within the assigned uncertainties $\sigma_{ni}$, indicating that those uncertainties
are numerically consistent with the actual measurement fluctuations. Similar histograms for the
individual experiments reveal various deviations from the theory, but globally the data have a
reasonable Gaussian distribution around CTEQ5M.
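
The comparison just described (a histogram of $(d - t)/\sigma$ against a unit-width Gaussian with no free parameters) can be sketched as follows. The helper names and the binning are assumptions made for illustration; the sketch does not reproduce the actual Fig. 2.

```python
import math


def pulls(data, theory, sigma):
    """Return the normalized residuals (d - t) / sigma for each data point."""
    return [(d - t) / s for d, t, s in zip(data, theory, sigma)]


def histogram(values, lo, hi, nbins):
    """Count values in nbins equal-width bins on [lo, hi); others are dropped."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


def gaussian_expectation(n_points, lo, hi, nbins):
    """Expected counts per bin for a unit-width Gaussian normalized to n_points."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    width = (hi - lo) / nbins
    return [n_points * (cdf(lo + (b + 1) * width) - cdf(lo + b * width))
            for b in range(nbins)]
```

Comparing `histogram(pulls(...), -4, 4, 16)` with `gaussian_expectation(len(data), -4, 4, 16)` bin by bin gives the kind of parameter-free check described in the text.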

3 Uncertainties on PDF parameters: The Error Matrix

We now describe results from an investigation of the behavior of the $\chi^2_{\rm global}$ function at its minimum,
using the standard error matrix approach [21]. This allows us to determine which combinations of
parameters are contributing the most to the uncertainty.

Figure 2: Histogram of the (measurement - theory) for all data points in the CTEQ5m fit.

At the minimum of $\chi^2_{\rm global}$, the first derivatives with respect to the $\{a_i\}$ are zero; so near the
minimum, $\chi^2_{\rm global}$ can be approximated by

$$ \chi^2_{\rm global} = \chi^2_0 + \frac{1}{2} \sum_{i,j} F_{ij}\, y_i y_j \tag{2} $$

where $y_i = a_i - a_{0i}$ is the displacement from the minimum, and $F_{ij}$ is the Hessian, the matrix of
second derivatives. It is natural to define a new set of coordinates using the complete orthonormal
set of eigenvectors of the symmetric matrix $F_{ij}$ as basis vectors. These vectors can be ordered
by their eigenvalues $e_i$. Each eigenvalue is a quantitative measure of the uncertainties in the
shape parameters $\{a_i\}$ for displacements in parameter space in the direction of the corresponding
eigenvector. The quantity $\ell_i = 1/\sqrt{e_i}$ is the distance in the 18 dimensional parameter space, in the
direction of eigenvector i, that makes a unit increase in $\chi^2_{\rm global}$. If the only measurement uncertainties
were uncorrelated gaussian uncertainties, then $\ell_i$ would be one standard deviation from the best
fit in the direction of the eigenvector. The inverse of the Hessian is the error matrix.
Because the real uncertainties, for the wide variety of experiments included, are far more
complicated than assumed in the ideal situation, the quantitative measure of a given increase in $\chi^2_{\rm global}$
carries little true statistical meaning. However, qualitatively, the Hessian gives an analytic picture
of $\chi^2_{\rm global}$ near the minimum in $\{a_i\}$ space, and hence allows us to identify the particular degrees of
freedom that need further experimental input in future global analyses.
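
To make the eigenvector picture concrete, the sketch below diagonalizes a small symmetric matrix with the cyclic Jacobi method and lists, for each eigen-direction, the eigenvalue $e_i$ and the length scale $\ell_i = 1/\sqrt{e_i}$ used in the text. The 3x3 matrix is a made-up stand-in for the 18x18 Hessian, and the Jacobi routine is a generic textbook algorithm chosen to keep the example dependency-free; it is not claimed to be the method used in the analysis.

```python
import math


def jacobi_eigen(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a real symmetric matrix (cyclic Jacobi).

    Returns (eigenvalues, vectors), where vectors[k][i] is component k of
    the eigenvector belonging to eigenvalues[i].
    """
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p, q of a and of v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p, q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def hessian_directions(hessian):
    """List (eigenvalue, 1/sqrt(eigenvalue), eigenvector), steepest direction first."""
    evals, vecs = jacobi_eigen(hessian)
    order = sorted(range(len(evals)), key=lambda i: evals[i], reverse=True)
    result = []
    for i in order:
        if evals[i] <= 0.0:
            raise ValueError("not a minimum: non-positive eigenvalue")
        result.append((evals[i], 1.0 / math.sqrt(evals[i]), [row[i] for row in vecs]))
    return result


if __name__ == "__main__":
    toy_hessian = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 0.01]]  # hypothetical
    for e, length, direction in hessian_directions(toy_hessian):
        print(f"e = {e:.4f}  l = {length:.4f}  direction = {direction}")
```

In the toy matrix the third parameter is a "flat direction": its small eigenvalue translates into a length scale many times larger than those of the two well-constrained combinations.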

From calculations of the Hessian we find that the eigenvalues vary over a wide range. Figure 3
shows a graph of the eigenvalues of $F_{ij}$, on a logarithmic scale. The vertical axis is $\ell_i = 1/\sqrt{e_i}$, the
distance of a "standard deviation" along the $i$th eigenvector. These distances range over 3 orders of
magnitude. Large eigenvalues of $F_{ij}$ correspond to "steep directions" of $\chi^2_{\rm global}$. The corresponding
eigenvectors are combinations of shape parameters that are well determined by current data. For
example, parameters that govern the valence u and d quarks at moderate x are sharply constrained
by DIS data. Small eigenvalues of $F_{ij}$ correspond to "flat directions" of $\chi^2_{\rm global}$. In the directions
of these eigenvectors, $\chi^2_{\rm global}$ changes little over large distances in $\{a_i\}$ space. For example,
parameters that govern the large-x behavior of the gluon distribution, or differences between sea
quarks, properties of the nucleon that are not accurately determined by current data, contribute
to the flat directions. The existence of flat directions is inevitable in global fitting, because as the
data improve it only makes sense to maintain enough flexibility for $f_a(x, Q_0)$ to fit the available
experimental constraints.

Figure 3: Plot of the eigenvalues of the Hessian. The vertical axis is $\ell_i = 1/\sqrt{e_i}$.

Because the eigenvalues of the Hessian have a large range of values, efficient calculation of $F_{ij}$
requires an adaptive algorithm. In principle $F_{ij}$ is the matrix of second derivatives at the
minimum of $\chi^2_{\rm global}$, which could be calculated from very small finite differences. In practice, small
computational errors in the evaluation of $\chi^2_{\rm global}$ preclude the use of a very small step size.
Coarse-grained finite differences yield a more accurate calculation of the second derivatives. But because
the variation of $\chi^2_{\rm global}$ differs markedly in different directions, it is important to use a grid in $\{a_i\}$
space with small steps in steep directions and large steps in flat directions. This grid is generated
by an iterative procedure, in which $F_{ij}$ converges to a good estimate of the second derivatives.

From calculations of $F_{ij}$ we find that the minimum of $\chi^2_{\rm global}$ is fairly quadratic over large distances
in the parameter space. Figures 4 and 5 show the behavior of $\chi^2_{\rm global}$ near the minimum along
each of the 18 eigenvectors. $\chi^2_{\rm global}$ is plotted on the vertical axis, and the variable on the horizontal
axis is the distance in $\{a_i\}$ space in the direction of the eigenvector, in units of $\ell_i = 1/\sqrt{e_i}$. There
is some nonlinearity, but it is small enough that the Hessian can be used as an analytic model of
the functional dependence of $\chi^2_{\rm global}$ on the shape parameters.

Figure 4: Value of $\chi^2$ along the six eigenvectors with the largest eigenvalues.
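
The idea of matching the finite-difference step to the steepness of each direction can be illustrated with a simplified sketch. The assumptions are stated plainly: the code below adapts the steps along the coordinate axes rather than along the eigenvector directions described in the text, it uses central differences, and the test function is a made-up quadratic with one steep and one flat parameter. The names `second_derivatives` and `adapt_steps` are hypothetical.

```python
import math


def second_derivatives(f, x0, steps):
    """Central-difference matrix of second derivatives of f at x0.

    steps[i] is the finite-difference step used for coordinate i.
    """
    n = len(x0)
    f0 = f(x0)

    def shifted(*moves):
        x = list(x0)
        for i, amount in moves:
            x[i] += amount
        return f(x)

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        hi = steps[i]
        hess[i][i] = (shifted((i, hi)) - 2.0 * f0 + shifted((i, -hi))) / hi**2
        for j in range(i + 1, n):
            hj = steps[j]
            mixed = (shifted((i, hi), (j, hj)) - shifted((i, hi), (j, -hj))
                     - shifted((i, -hi), (j, hj)) + shifted((i, -hi), (j, -hj)))
            hess[i][j] = hess[j][i] = mixed / (4.0 * hi * hj)
    return hess


def adapt_steps(f, x0, steps, target=1.0, iterations=3):
    """Rescale each step so that f rises by roughly `target` along that axis.

    Assumes x0 is a minimum, so the diagonal second derivatives are positive.
    """
    steps = list(steps)
    for _ in range(iterations):
        hess = second_derivatives(f, x0, steps)
        steps = [math.sqrt(2.0 * target / hess[i][i]) for i in range(len(x0))]
    return steps, second_derivatives(f, x0, steps)


def toy_chi2(a):
    """Hypothetical quadratic with one steep and one flat parameter."""
    return 100.0 * a[0] ** 2 + 0.01 * a[1] ** 2 + a[0] * a[1]
```

For `toy_chi2`, starting from unit steps, `adapt_steps` settles on a step about 100 times larger for the flat parameter than for the steep one, which is the behavior the adaptive grid is meant to achieve.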

In a future paper we will provide details on the uncertainties of the original shape parameters
$\{a_i\}$. But it should be remembered that these parameters specify the PDFs at the low Q scale,
and applications of PDFs to Tevatron experiments use PDFs at a high Q scale. The evolution
equations determine $f(x, Q)$ from $f(x, Q_0)$, so the functional form at Q depends on the $\{a_i\}$ in a
complicated way.

Figure 5: Value of $\chi^2$ along the 12 eigenvectors with the smallest eigenvalues.

4 Uncertainty on $\sigma_W$ - the Lagrange Multiplier Method

In this Section, we determine the variation of $\chi^2_{\rm global}$ as a function of a single measurable quantity.
We use the production cross section for W bosons ($\sigma_W$) as an archetypal example. The same method
can be applied to any other physical observable of interest, for instance the Higgs production cross
section, or to certain measured differential distributions. The aim is to quantify the uncertainty on
that physical observable due to uncertainties of the PDFs integrated over the entire PDF parameter
space.

Again, we use the standard CTEQ5 analysis tools and results || as the starting point. The "best
fit" is the CTEQ5M1 set. A natural way to find the limits of a physical quantity X, such as $\sigma_W$
at $\sqrt{s} = 1.8$ TeV, is to take X as one of the search parameters in the global fit and study the
dependence of $\chi^2_{\rm global}$ for the 15 base experimental data sets on X.

Conceptually, we can think of the function $\chi^2_{\rm global}$ that is minimized in the fit as a function of
$a_1, \ldots, a_{17}, X$ instead of $a_1, \ldots, a_{18}$. This idea could be implemented directly in principle, but a
more convenient way to do the same thing in practice is through Lagrange's method of undetermined
multipliers. One minimizes, with respect to the $\{a_i\}$, the quantity

$$ F(\lambda) = \chi^2_{\rm global} + \lambda\, X(a_1, \ldots, a_{18}) \tag{3} $$

for a fixed value of $\lambda$, the Lagrange multiplier. By minimizing $F(\lambda)$ for many values of $\lambda$, we map
out $\chi^2_{\rm global}$ as a function of X. The minimum of F for a given value of $\lambda$ is the best fit to the data
for the corresponding value of X, i.e., evaluated at the minimum.
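
The Lagrange multiplier scan can be sketched on a toy problem. Everything in the example is an assumption made for illustration: a two-parameter quadratic stands in for $\chi^2_{\rm global}$, a linear function stands in for the observable X, and a plain gradient descent with numerical gradients stands in for the real minimizer. Only the logic (minimize $\chi^2 + \lambda X$ for several $\lambda$, then record the pair of values X and $\chi^2$ at each minimum) follows the text.

```python
def chi2(a):
    """Toy quadratic stand-in for chi^2_global (not a real fit)."""
    return 10.0 + 4.0 * (a[0] - 1.0) ** 2 + (a[1] - 2.0) ** 2


def observable(a):
    """Toy stand-in for the physical quantity X(a)."""
    return a[0] + a[1]


def minimize(func, start, rate=0.1, iterations=500, h=1e-6):
    """Plain gradient descent with central-difference gradients."""
    a = list(start)
    for _ in range(iterations):
        grad = []
        for i in range(len(a)):
            up = a[:i] + [a[i] + h] + a[i + 1:]
            down = a[:i] + [a[i] - h] + a[i + 1:]
            grad.append((func(up) - func(down)) / (2.0 * h))
        a = [ai - rate * gi for ai, gi in zip(a, grad)]
    return a


def lagrange_scan(lambdas, start=(0.0, 0.0)):
    """For each lambda, minimize chi2 + lambda * X and record (X, chi2)."""
    points = []
    for lam in lambdas:
        best = minimize(lambda a: chi2(a) + lam * observable(a), list(start))
        points.append((observable(best), chi2(best)))
    return points


if __name__ == "__main__":
    for x_value, chi2_value in lagrange_scan([-1.6, 0.0, 1.6]):
        print(f"X = {x_value:.3f}   chi2 = {chi2_value:.3f}")
```

For this toy the recorded points lie on the parabola $\chi^2 = 10 + 0.8\,(X - 3)^2$: a positive $\lambda$ pushes X below its best-fit value and a negative $\lambda$ pushes it above, at the cost of a larger $\chi^2$ in both cases.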

Figure 6 shows $\chi^2_{\rm global}$ for the 15 base experimental data sets as a function of $\sigma_W$ at the Tevatron.
The horizontal axis is $\sigma_W$ times the branching ratio for $W \to$ leptons, in nb. The CTEQ5m
prediction is $\sigma_W \cdot BR_{\rm lep} = 2.374$ nb. The vertical dashed lines are ±3% and ±5% deviations from
the CTEQ5m prediction.

Figure 6: $\chi^2$ of the base experimental data sets versus $\sigma_W \cdot BR_{\rm lep}$, the W production cross section
at the Tevatron times lepton branching ratio, in nb.

The two parabolas associated with points in Fig. 6 correspond to different treatments of the
normalization factor $N_n$ in Eq. (1). The dots are variable norm fits, in which $N_n$ is allowed to float, taking
into account the experimental normalization uncertainties, and $F(\lambda)$ is minimized with respect
to $N_n$. The justification for this procedure is that overall normalization is a common systematic
uncertainty. The square points are fixed norm fits, in which all $N_n$ are held fixed at their values
for the global minimum (CTEQ5m). These two procedures represent extremes in the treatment of
normalization uncertainty. The parabolas are just least-square fits to the points in the two cases.
The other curve in Fig. 6 was calculated using the Hessian method. The Hessian $F_{ij}$ is the matrix
of second derivatives of $\chi^2_{\rm global}$ with respect to the shape parameters $\{a_i\}$. The derivatives (first
and second) of $\sigma_W$ may also be calculated by finite differences. Using the resultant quadratic
approximations for $\chi^2_{\rm global}(a)$ and $\sigma_W(a)$, one may minimize $\chi^2_{\rm global}$ with $\sigma_W$ fixed. Since this
calculation keeps the normalization factors constant, it should be compared with the fixed norm
fits from the Lagrange multiplier method. The fact that the Hessian and Lagrange multiplier
methods yield similar results lends support to both approaches; the small difference between them
indicates that the quadratic functional approximations for $\chi^2_{\rm global}$ and $\sigma_W$ are only approximations.

For the quantitative analysis of uncertainties, the important question is: How large an increase
in $\chi^2_{\rm global}$ should be taken to define the likely range of uncertainty in X? There is an elementary
statistical theorem that states that $\Delta\chi^2 = 1$ in a constrained fit corresponds to 1 standard deviation
of the constrained quantity X. However, the theorem relies on the assumption that the uncertainties
are gaussian, uncorrelated, and correctly estimated in magnitude. Because these conditions do not
hold for the full data set (of ~1300 points from 15 different experiments), this theorem cannot be
naively applied quantitatively.$^4$ Indeed, it can be shown that, if the measurement uncertainties are
correlated, and the correlation is not properly taken into account in the definition of $\chi^2_{\rm global}$, then
a standard deviation may vary over the entire range from $\Delta\chi^2 = 1$ to $\Delta\chi^2 = N$ (the total number
of data points, ~1300 in our case).



5 STATISTICAL ANALYSIS WITH SYSTEMATIC UNCERTAINTIES

Fig. 6 shows how the fitting function $\chi^2_{\rm global}$ increases from its minimum value, at the best global fit,
as the cross section $\sigma_W$ for W production is forced away from the prediction of the global fit. The
next step in our analysis of PDF uncertainty is to use that information, or some other analysis,
to estimate the uncertainty in $\sigma_W$. In ideal circumstances we could say that a certain increase
of $\chi^2_{\rm global}$ from the minimum value, call it $\Delta\chi^2$, would correspond to a standard deviation of the
global measurement uncertainty. Then a horizontal line on Fig. 6 at $\chi^2_{\rm min} + \Delta\chi^2$ would indicate
the probable range of $\sigma_W$, by the intersection with the parabola of $\chi^2_{\rm global}$ versus $\sigma_W$.

However, such a simple estimate of the uncertainty of $\sigma_W$ is not possible, because the fitting function
$\chi^2_{\rm global}$ does not include the correlations between systematic uncertainties. The uncertainty $\sigma_{ni}$ in
the definition (1) of $\chi^2_{\rm global}$ combines in quadrature the statistical and systematic uncertainties
for each data point; that is, it treats the systematic uncertainties as uncorrelated. The standard
theorems of statistics for Gaussian probability distributions of random uncertainties do not apply
to $\chi^2_{\rm global}$.

Instead of using $\chi^2_{\rm global}$ to estimate confidence levels on $\sigma_W$, we believe the best approach is to
carry out a thorough statistical analysis, including the correlations of systematic uncertainties,
on individual experiments used in the global fit for which detailed information is available. We
will describe here such an analysis for the measurements of $F_2(x, Q)$ by the H1 experiment [12] at
HERA, as a case study. In a future paper, we will present similar calculations for other experiments.

The H1 experiment has provided a detailed table of measurement uncertainties - statistical and
systematic - for their measurements of $F_2(x, Q)$ [12]. The CTEQ program uses 172 data points
from H1 (requiring the cut $Q^2 > 5\ {\rm GeV}^2$). For each measurement $d_j$ (where j = 1 ... 172) there is
a statistical uncertainty $\sigma_{0j}$, an uncorrelated systematic uncertainty $\sigma_{1j}$, and a set of 4 correlated
systematic uncertainties $\sigma_{jk}$ where k = 1 ... 4. (In fact there are 8 correlated uncertainties listed
in the H1 table. These correspond to 4 pairs. Each pair consists of one standard deviation in the
positive sense, and one standard deviation in the negative sense, of some experimental parameter.
For this first analysis, we have approximated each pair of uncertainties by a single, symmetric
combination, equal in magnitude to the average magnitude of the pair.)

4 It has been shown by Giele et al. Q, that, taken literally, only one or two selected experiments satisfy the
standard statistical tests.

Table 2: Comparison of H1 data to the PDF fits with constrained values of $\sigma_W$.

To judge the uncertainty of $\sigma_W$, as constrained by the H1 data, we will compare the H1 data to the
global fits in Fig. 6. The comparison is based on the true, statistical $\chi^2$, including the correlated
uncertainties, which is given by

$$ \chi^2 = \sum_j \frac{(d_j - t_j)^2}{\sigma_j^2} - \sum_{k,k'} B_k \left(A^{-1}\right)_{kk'} B_{k'} \tag{4} $$

The index j labels the data points and runs from 1 to 172, and $t_j$ is the theoretical value
corresponding to $d_j$. The indices k and k' label the source
of systematic uncertainty and run from 1 to 4. The combined uncorrelated uncertainty $\sigma_j$ is
$\sqrt{\sigma_{0j}^2 + \sigma_{1j}^2}$. The second term in (4) comes from the correlated uncertainties. $B_k$ is the vector

$$ B_k = \sum_j \frac{\sigma_{jk}\,(d_j - t_j)}{\sigma_j^2} \tag{5} $$

and $A_{kk'}$ is the matrix

$$ A_{kk'} = \delta_{kk'} + \sum_j \frac{\sigma_{jk}\,\sigma_{jk'}}{\sigma_j^2} \tag{6} $$

Assuming the published uncertainties $\sigma_{0j}$, $\sigma_{1j}$ and $\sigma_{jk}$ accurately reflect the measurement
fluctuations, $\chi^2$ would obey a chi-square distribution if the measurements were repeated many times.
Therefore the chi-square distribution with 172 degrees of freedom provides a basis for calculating
confidence levels for the global fits in Fig. 6.
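
A $\chi^2$ that includes correlated systematic uncertainties can be sketched as below. The sketch assumes the standard form in which the uncorrelated sum of squared residuals over $\sigma_j^2$ is reduced by the term $B^T A^{-1} B$, with $B_k$ built from the correlated errors and residuals and $A_{kk'}$ equal to the unit matrix plus the overlap of the correlated errors. The function names, the argument layout `sigma_corr[j][k]`, and the small linear solver are hypothetical choices made to keep the example self-contained.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def chi2_correlated(data, theory, sigma0, sigma1, sigma_corr):
    """chi^2 with correlated systematics; sigma_corr[j][k] is source k for point j."""
    n_sources = len(sigma_corr[0])
    # Combined uncorrelated variance sigma_j^2 = sigma_0j^2 + sigma_1j^2.
    var = [s0**2 + s1**2 for s0, s1 in zip(sigma0, sigma1)]
    resid = [d - t for d, t in zip(data, theory)]
    first = sum(r * r / v for r, v in zip(resid, var))
    # Vector B_k = sum_j sigma_jk (d_j - t_j) / sigma_j^2.
    b = [sum(sc[k] * r / v for sc, r, v in zip(sigma_corr, resid, var))
         for k in range(n_sources)]
    # Matrix A_kk' = delta_kk' + sum_j sigma_jk sigma_jk' / sigma_j^2.
    a = [[(1.0 if k == kp else 0.0)
          + sum(sc[k] * sc[kp] / v for sc, v in zip(sigma_corr, var))
          for kp in range(n_sources)] for k in range(n_sources)]
    a_inv_b = solve_linear(a, b)
    return first - sum(bk * xk for bk, xk in zip(b, a_inv_b))
```

As a check on the algebra, for two points with unit uncorrelated errors, one fully shared unit systematic error and residuals (1, 2), the function returns 2, which agrees with inverting the 2x2 covariance matrix [[2, 1], [1, 2]] directly; with the correlated error set to zero it returns the uncorrelated value 5.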

Table 2 shows $\chi^2$ for the H1 data compared to seven of the PDF fits in Fig. 6. The center row of the
Table is the global best fit - CTEQ5m. The other rows are fits obtained by the Lagrange multiplier
method for different values of the Lagrange multiplier. The best fit to the H1 data, i.e., the smallest
$\chi^2$, is not CTEQ5m (the best global fit) but rather the fit with Lagrange multiplier 1000 for which
$\sigma_W$ is 0.8% smaller than the prediction of CTEQ5m. Forcing the W cross section values away
from the prediction of CTEQ5m causes an increase in $\chi^2$ for the DIS data. At $\sqrt{s} = 1.8$ TeV, W
production is mainly from $q\bar{q} \to W^{\pm}$ with moderate values of x for $q$ and $\bar{q}$, i.e., values in the
range of DIS experiments. Forcing $\sigma_W$ higher (or lower) requires a higher (or lower) valence quark
density in the proton, in conflict with the DIS data, so $\chi^2$ increases.

The final column in Table 2, labeled "probability", is computed from the chi-square distribution
with 172 degrees of freedom. This quantity is the probability for $\chi^2$ to be greater than the value
calculated from the existing data, if the H1 measurements were to be repeated. So, for example, the
fit with Lagrange multiplier -3000, which corresponds to $\sigma_W$ being 3.2% larger than the CTEQ5m
prediction, has probability 0.092. In other words, if the H1 measurements could be repeated many
times, in only 9.2% of trials would $\chi^2$ be greater than or equal to the value that has been obtained
with the existing data. This probability represents a confidence level for the value of $\sigma_W$ that was
forced on the PDF by setting the Lagrange multiplier equal to -3000. At the 9.2% confidence level
we can say that $\sigma_W \cdot BR_{lep}$ is less than 2.450 nb, based on the H1 data. Similarly, at the 21.2%
confidence level we can say that $\sigma_W \cdot BR_{lep}$ is greater than 2.294 nb.
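
A tail probability of this kind can be estimated with the standard library alone, as sketched below. The sketch uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is an assumption made here for simplicity (it is accurate for large numbers of degrees of freedom such as 172) and not necessarily how the tabulated probabilities were computed. The function name is illustrative.

```python
import math


def chi2_tail_probability(chi2, dof):
    """Approximate P(chi^2 >= chi2) for `dof` degrees of freedom.

    Wilson-Hilferty: (chi^2/dof)**(1/3) is roughly normal with
    mean 1 - 2/(9 dof) and variance 2/(9 dof).
    """
    spread = 2.0 / (9.0 * dof)
    z = ((chi2 / dof) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# A fit with chi^2 equal to the number of degrees of freedom.
print(chi2_tail_probability(172.0, 172))  # about 0.49
```

A fit whose $\chi^2$ lies well above 172 gives a correspondingly smaller probability, which is the trend described for the constrained fits far from the best fit.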




[Plot for Figure 7. Horizontal axis: $\sigma_W \cdot BR_{lep}$ (nb), from 2.25 to 2.5.]

Figure 7: $\chi^2/N$ of the H1 data, including error correlations, compared to PDFs obtained by the
Lagrange multiplier method for constrained values of $\sigma_W$.

Fig. 7 is a graph of $\chi^2/N$ for the H1 data compared to the PDF fits in Table 2. This figure may
be compared to Fig. ^. The CTEQ5 prediction of the W production cross-section is shown as an
arrow, and the vertical dashed lines are ±3% away from the CTEQ5m prediction. The horizontal
dashed line is the 68% confidence level on $\chi^2/N$ for N = 172 degrees of freedom. The comparison
with H1 data alone indicates that the uncertainty on $\sigma_W$ is approximately 3%.
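
The height of such a 68% line can be estimated as in the sketch below. It assumes the line marks the value of $\chi^2/N$ below which 68% of repeated measurements would fall, and it again uses the Wilson-Hilferty approximation rather than the exact chi-square distribution. The function name is illustrative.

```python
import math
from statistics import NormalDist


def chi2_quantile(prob, dof):
    """Approximate chi^2 value below which a fraction `prob` of trials fall."""
    z = NormalDist().inv_cdf(prob)  # standard normal quantile
    spread = 2.0 / (9.0 * dof)
    # Invert the Wilson-Hilferty cube-root transformation.
    return dof * (1.0 - spread + z * math.sqrt(spread)) ** 3


n = 172
print(chi2_quantile(0.68, n) / n)  # about 1.05
```

With 172 points the 68% level lies only about 5% above $\chi^2/N = 1$, so fits whose $\chi^2/N$ differs by a few percent fall on opposite sides of the line.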

There is much more to say about $\chi^2$ and confidence levels. In a future paper we will discuss
statistical calculations for other experiments in the global data set. The H1 experiment is a good
case, because for H1 we have detailed information about the correlated uncertainties. But it may
be somewhat fortuitous that the $\chi^2$ per data point for CTEQ5m is so close to 1 for the H1 data
set. In cases where $\chi^2/N$ is not close to 1, which can easily happen if the estimated systematic
uncertainties are not textbook-like, we must supply further arguments about confidence levels. For
experiments with many data points, like 172 for H1, the chi-square distribution is very narrow, so
a small inaccuracy in the estimate of $\sigma_j$ may translate to a large uncertainty in the calculation
of confidence levels based on the absolute value of $\chi^2$. Because the estimation of experimental
uncertainties introduces some uncertainty in the value of $\chi^2$, it is not really the absolute value of
$\chi^2$ that is important, but rather the relative value compared to the value at the global minimum.
Therefore, we might study ratios of $\chi^2$'s to interpret the variation of $\chi^2$ with $\sigma_W$.
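
A rough estimate illustrates how narrow the distribution is. For $N$ degrees of freedom, $\chi^2$ has mean $N$ and variance $2N$. As a simplifying assumption made only for this illustration, suppose every $\sigma_j$ is misestimated by a common fraction $\epsilon$ and the correlated term is ignored, so that $\chi^2$ simply rescales:

```latex
\[
\frac{\sqrt{2N}}{N} = \sqrt{\frac{2}{172}} \approx 0.108,
\qquad
\frac{\chi^2\big|_{\sigma_j \to (1+\epsilon)\sigma_j}}{\chi^2\big|_{\sigma_j}}
  = \frac{1}{(1+\epsilon)^2} \approx 1 - 2\epsilon .
\]
```

For $\epsilon = 0.05$ the shift in $\chi^2$ is about 9%, nearly one standard deviation of $\chi^2/N$, which is enough to change a confidence level based on the absolute value substantially.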

6 CONCLUSIONS 

It has been widely recognized by the HEP community, and it has been emphasized at this workshop,
that PDF phenomenology must progress from the past practice of periodic updating of representative
PDF sets to a systematic effort to map out the uncertainties, both on the PDFs themselves and
on physical observables derived from them. For the analysis of PDF uncertainties, we have only
addressed the issues related to the treatment of experimental uncertainties. Equally important for
the ultimate goal, one must come to grips with uncertainties associated with theoretical
approximations and phenomenological parametrizations. Both of these sources of uncertainties induce
highly correlated uncertainties, and they can be numerically more important than experimental
uncertainties in some cases. Only a balanced approach is likely to produce truly useful results.
Thus, a great deal of work lies ahead.

This report described first results from two methods for quantifying the uncertainty of parton
distribution functions associated with experimental uncertainties. The specific work is carried out
as extensions of the CTEQ5 global analysis. The same methods can be applied using other parton
distributions as the starting point, or using a different parametrization of the non-perturbative
PDFs. We have indeed tried a variety of such alternatives. The results are all similar to those
presented above. The robustness of these results lends confidence to the general conclusions.

The Hessian, or error matrix, method reveals the uncertainties of the shape parameters used in the
functional parametrization. The behavior of $\chi^2_{global}$ in the neighborhood of the minimum is well
described by the Hessian if the minimum is quadratic.

The Lagrange multiplier method produces constrained fits, i.e., the best fits to the global data set
for specified values of some observable. The increase of $\chi^2_{global}$, as the observable is forced away
from the predicted value, indicates how well the current data on PDFs determines the observable.
The constrained fits generated by the Lagrange multiplier method may be compared to data from
individual experiments, taking into account the uncertainties in the data, to estimate confidence
levels for the constrained variable. For example, we estimate that the uncertainty of $\sigma_W$ attributable
to PDFs is ±3%.
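
The idea of a constrained fit can be illustrated with a toy model, sketched below. It is not the CTEQ fit: it assumes a $\chi^2$ that is exactly quadratic in two hypothetical parameters, an observable that is linear in them, and the sign convention that the function minimized is $\chi^2 + \lambda X$. All names and numbers are invented for the illustration.

```python
def constrained_fit(lam, hess_diag, grad):
    """Minimize chi2(a) + lam * X(a) for a toy quadratic chi2 and linear X.

    chi2(a) = sum_i hess_diag[i] * a[i]**2  (increase above the minimum at a = 0)
    X(a)    = sum_i grad[i] * a[i]          (shift of the observable from its
                                             best-fit value)
    Returns the shift of the observable and the increase of chi2.
    """
    # Stationarity condition: 2 * h_i * a_i + lam * g_i = 0 for each parameter.
    a = [-lam * g / (2.0 * h) for h, g in zip(hess_diag, grad)]
    delta_x = sum(g * ai for g, ai in zip(grad, a))
    delta_chi2 = sum(h * ai ** 2 for h, ai in zip(hess_diag, a))
    return delta_x, delta_chi2


hess_diag = [4.0, 1.0]  # hypothetical curvatures of chi2
grad = [1.0, 0.5]       # hypothetical sensitivities of the observable
for lam in (-2.0, -1.0, 1.0, 2.0):
    print(lam, constrained_fit(lam, hess_diag, grad))
```

In this toy model each value of the multiplier yields one point on a parabola of $\Delta\chi^2$ versus the shift of the observable, and a positive multiplier lowers the observable. A real global fit need not be quadratic, which is why the constrained fits are obtained by explicit minimization.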

Further work is needed to apply these methods to other measurements, such as the W mass or the
forward-backward asymmetry of W production in $p\bar{p}$ collisions. Such work will be important for
analyzing high precision measurements involving the electroweak bosons in the Tevatron Run II
era.